Source Language Adaptation Approaches for Resource-Poor Machine Translation
Authors
Abstract
Most of the world languages are resource-poor for statistical machine translation; still, many of them are actually related to some resource-rich language. Thus, we propose three novel, language-independent approaches to source language adaptation for resource-poor statistical machine translation. Specifically, we build improved statistical machine translation models from a resource-poor language POOR into a target language TGT by adapting and using a large bitext for a related resource-rich language RICH and the same target language TGT. We assume a small POOR–TGT bitext from which we learn word-level and phrase-level paraphrases and cross-lingual morphological variants between the resource-rich and the resource-poor language. Our work is of importance for resource-poor machine translation because it can provide a useful guideline for people building machine translation systems for resource-poor languages. Our experiments for Indonesian/Malay–English translation show that using the large adapted resource-rich bitext yields 7.26 BLEU points of improvement over the unadapted one and 3.09 BLEU points over the original small bitext. Moreover, combining the small POOR–TGT bitext with the adapted bitext outperforms the corresponding combinations with the unadapted bitext by 1.93–3.25 BLEU points. We also demonstrate the applicability of our approaches to other languages and domains.
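As a rough illustration of the word-level part of this adaptation idea, the sketch below rewrites the source side of a large RICH-TGT bitext using a RICH-to-POOR variant lexicon (for example, Malay to Indonesian), so that the adapted bitext can be combined with the small POOR-TGT bitext for SMT training. The file names (ms_id_variants.tsv, train.ms) and the simple one-to-one substitution are illustrative assumptions; the paper's actual approach learns word-level and phrase-level paraphrases and cross-lingual morphological variants rather than applying a fixed dictionary.

# Illustrative sketch, not the authors' pipeline: substitute known word-level
# variants on the source side of a large RICH-TGT bitext so it looks more like
# the resource-poor language POOR. Lexicon and corpus file names are hypothetical.

def load_variant_lexicon(path):
    """Read tab-separated RICH-word<TAB>POOR-word pairs (e.g., Malay -> Indonesian)."""
    lexicon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            rich, poor = line.rstrip("\n").split("\t")
            lexicon[rich] = poor
    return lexicon

def adapt_sentence(sentence, lexicon):
    """Replace each token that has a known POOR variant; leave other tokens unchanged."""
    return " ".join(lexicon.get(tok, tok) for tok in sentence.split())

def adapt_bitext(src_in, src_out, lexicon):
    """Rewrite the source side of the bitext; the target side is reused as is."""
    with open(src_in, encoding="utf-8") as fin, open(src_out, "w", encoding="utf-8") as fout:
        for line in fin:
            fout.write(adapt_sentence(line.rstrip("\n"), lexicon) + "\n")

if __name__ == "__main__":
    lex = load_variant_lexicon("ms_id_variants.tsv")   # hypothetical lexicon file
    adapt_bitext("train.ms", "train.adapted.id", lex)  # hypothetical corpus files

The adapted source file, paired with the unchanged target side, can then be concatenated with the small POOR-TGT bitext before training a standard phrase-based SMT system, which is how the combined bitexts in the experiments above are used.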
Similar resources
Improved Statistical Machine Translation for Resource-Poor Languages Using Related Resource-Rich Languages
We propose a novel language-independent approach for improving statistical machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y fo...
Source Language Adaptation for Resource-Poor Machine Translation
We propose a novel, language-independent approach for improving machine translation from a resource-poor language to X by adapting a large bi-text for a related resource-rich language and X (the same target language). We assume a small bi-text for the resource-poor language to X pair, which we use to learn word-level and phrase-level paraphrases and cross-lingual morphological variants between t...
Leveraging Diverse Sources in Statistical Machine Translation
Statistical machine translation is often faced with the problem of having insufficient training data for many language pairs. In this thesis, several methods have been proposed to leverage other sources to enhance the quality of machine translation systems. Particularly, we propose approaches suitable in these four scenarios: 1. when an additional parallel corpus between the source and the targ...
An Empirical Comparison of Simple Domain Adaptation Methods for Neural Machine Translation
In this paper, we compare two simple domain adaptation methods for neural machine translation (NMT): (1) We append an artificial token to the source sentences of two parallel corpora (from different domains, one of which is resource-scarce) to indicate the domain and then mix them to learn a multi-domain NMT model; (2) We learn an NMT model on the resource-rich domain corpus and then fine-tune it u...
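A minimal sketch of method (1) under assumed file names and domain tokens: each source sentence is prefixed with an artificial token marking its domain, and the tagged corpora are mixed into a single training set for a multi-domain NMT model. The tokens <2news> and <2medical> and the corpus paths are hypothetical placeholders.

import random

# Sketch of domain-token mixing for multi-domain NMT training data.
# Every path and token below is an assumed placeholder, not from the paper.

def tag_corpus(src_path, tgt_path, domain_token):
    """Yield (tagged_source, target) pairs for one parallel corpus."""
    with open(src_path, encoding="utf-8") as fs, open(tgt_path, encoding="utf-8") as ft:
        for src, tgt in zip(fs, ft):
            yield f"{domain_token} {src.strip()}", tgt.strip()

def mix_corpora(corpora, out_src, out_tgt, seed=0):
    """Concatenate the tagged corpora, shuffle, and write mixed training files."""
    pairs = []
    for src_path, tgt_path, token in corpora:
        pairs.extend(tag_corpus(src_path, tgt_path, token))
    random.Random(seed).shuffle(pairs)
    with open(out_src, "w", encoding="utf-8") as fs, open(out_tgt, "w", encoding="utf-8") as ft:
        for src, tgt in pairs:
            fs.write(src + "\n")
            ft.write(tgt + "\n")

if __name__ == "__main__":
    mix_corpora(
        [("news.src", "news.tgt", "<2news>"),            # resource-rich domain (assumed)
         ("medical.src", "medical.tgt", "<2medical>")],  # resource-scarce domain (assumed)
        "mixed.src", "mixed.tgt")

Prefixing the token on the source side lets a single shared model serve both domains: at decoding time the desired domain is selected simply by adding the corresponding token to the input sentence.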
Improving Statistical Machine Translation for a Resource-Poor Language Using Related Resource-Rich Languages
We propose a novel language-independent approach for improving machine translation for resource-poor languages by exploiting their similarity to resource-rich ones. More precisely, we improve the translation from a resource-poor source language X1 into a resource-rich language Y given a bi-text containing a limited number of parallel sentences for X1-Y and a larger bi-text for X2-Y for some reso...
Journal: Computational Linguistics
Volume 42
Pages: -
Publication year: 2016